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Abstract 

This work relates the framework of model-based clustering for spatial functional 
data where the data are surfaces. We first introduce a Bayesian spatial spline re¬ 
gression model with mixed-effects (BSSR) for modeling spatial function data. The 
BSSR model is based on Nodal basis functions for spatial regression and accom¬ 
modates both common mean behavior for the data through a fixed-effects part, 
and variability inter-individuals thanks to a random-effects part. Then, in order 
to model populations of spatial functional data issued from heterogeneous groups, 
we integrate the BSSR model into a mixture framework. The resulting model is 
a Bayesian mixture of spatial spline regressions with mixed-effects (BMSSR) used 
for density estimation and model-based surface clustering. The models, through 
their Bayesian formulation, allow to integrate possible prior knowledge on the data 
structure and constitute a good alternative to recent mixture of spatial spline re¬ 
gressions model estimated in a maximum likelihood framework via the expectation- 
maximization (EM) algorithm. The Bayesian model inference is performed by 
Markov Chain Monte Carlo (MCMC) sampling. We derive two Gibbs sampler to 
infer the BSSR and the BMSSR models and apply them on simulated surfaces and 
a real problem of handwritten digit recognition using the MNIST data set. The ob¬ 
tained results highlight the potential benefit of the proposed Bayesian approaches 
for modeling surfaces possibly dispersed in particular in clusters. 

key-words: Bayesian spatial spline regression; Bayesian mixture of spatial spline regres¬ 
sion; Surface approximation; Model-based surface clustering; Gibbs sampling; Spatial 
functional data analysis; Handwritten digit recognition. 

1 Introduction 

Functional data analysis (FDA) (Ramsay and Silverman, 2005, 2002; Ferraty and Vieu, 
2006) is the paradigm of data analysis in which the individuals are functions (e.g., curves 
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or surfaces) rather than vectors of reduced dimension. Most of the classical analyses di¬ 
rectly consider the data to be analyzed as vectors. However, in many areas of application, 
including signal and image processing, functional imaging, handwritten text recognition, 
genomics, diagnosis of complex systems, etc., the analyzed data are often available in the 
form of (discretized) values of functions or curves (e.g., times series, waveforms, etc) and 
surfaces (2D-images, spatio-temporal data, etc) which makes them very structured. This 
“functional" aspect of the data adds additional difficulties in the the analysis compared 
to the case of a classical multivariate analysis. It is fortunately possible to overcome 
these difficulties encountered in multivariate (non functional) analysis techniques, by 
formulating “functional” models that explicitly integrate the functional form of the data, 
rather than directly considering them as vectors. This is the FDA framework for data 
clustering, classification and regression. The key tenet of FDA is to treat the data not 
just as multivariate observations but as (discretized) values of smooth functions. This 
approach allows to more fully exploit the structure of the data. In this framework, 
several models have been introduced to model univariate and multivariate functional 
data for clustering or classification. Among these models, one distinguishes the finite 
mixture model-based ones, on which we focus in this paper. Indeed, the flexibility, 
easy interpretation and efficiency of finite mixture models (McLachlan and Peel., 2000; 
Friihwirth-Schnatter, 2006; Titterington et ah, 1985) in multivariate analysis, has lead 
to a growing investigation for adapting them to the framework of FDA. For example, 
one can cite the following papers, among many others, which relate probabilistic gen¬ 
erative models for FDA (Devijver, 2014; Jacques and Preda, 2014; Chamroukhi et ah, 
2013; Delaigle et ah, 2012; Bouveyron and Jacques, 2011; Same et ah, 2011; Chamroukhi 
et ah, 2010; Chamroukhi, 2010; Chamroukhi et ah, 2009; Liu and Yang, 2009; Gaffney 
and Smyth, 2004; Gaffney, 2004; James and Sugar, 2003; James and Hastie, 2001). 

These models have however mainly focused on the study of univariate or multivariate 
functions. For the case of spatial functional data, Malfait and Ramsay (2003); Ramsay 
et ah (2011); Sangalli et ah (2013); Nguyen et ah (2014) proposed methods to deal with 
surfaces. In particular, the recent approach proposed by Nguyen et ah (2014) for clus¬ 
tering and classification of surfaces is based on the regression spatial spline regression 
as in Sangalli et ah (2013) in a mixture of linear mixed-effects model framework as in 
(Celeux et ah, 2005). Nguyen et ah (2014) indeed extended the functional data analysis 
framework for univariate functions to the analysis of spatial functions (i.e. surfaces) 
by introducing a spatial spline regression (SSR) model and a mixture of spatial spline 
regressions (MSSR) model, to respectively model homogeneous surfaces and heteroge¬ 
neous surfaces with a clustering structure. The SSR model with mixed-effects is tailored 
to spatial regression data with both fixed-effects and random-effects. The mixture of 
spatial spline regression (MSSR) is dedicated to surface clustering, as in (James and 
Sugar, 2003) for curve clustering, while the mixture of spatial spline regression discrim- 
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inant analysis (MSSR-DA) is deditcated to curve discrimination, in a similar way as 
the discriminant analysis approach for curves proposed by James and Hastie (2001). 
The usual used tool for model estimation is maximum likelihood estimation (MLE) by 
using the expectation-maximization (EM) algorithm (McLachlan and Krishnan, 2008; 
Dempster et ah, 1977). While MLE via the EM algorithm is the standard way to fit 
finite mixture-based models, a common alternative is the Bayesian inference, that is, 
the maximum a posteriori (MAP) estimation by using in general Markov Chain Monte 
Carlo (MCMC) sampling. 

Indeed, the Bayesian inference framework has also led to intensive research in the 
hied of mixture models and Bayesian methods for mixtures have become popular due 
to advances in both methodology and computing power. The application of Bayesian 
methods to mixture models are included namely in Robert (1994), and Andrew Gelman 
and Rubin (2003). Some key papers on the Bayesian analysis of mixtures are Diebolt and 
Robert (1994), Escobar and West (1994) and Richardson and Green (1997) and Celeux 
et al. (2000). One can cite for example the following references among many others that 
deal with Bayesian mixture modeling and inference: (Robert, 1994; Stephens, 1997; 
Bensmail et ah, 1997; Ormoneit and Tresp, 1998; Stephens, 2000; Marin et ah, 2005; 
Fruhwirth-Schnatter, 2006; Fraley and Raftery, 2007) 

While the MLE approaches maximizes the model likelihood, the Bayesian (MAP) 
approach maximizes adds a prior distribution over the model parameters and then max¬ 
imizes the posterior parameter distribution. The MAP estimation can still be performed 
by the EM algorithm (namely in the case of conjugate priors) as in Fraley and Raftery 
(2007) or by MCMC sampling, such as the Gibbs sampler Neal (1993); Raftery and 
Lewis (1992,?); Bensmail et ah (1997); Marin et ah (2005); Robert and Casella (2011). 
For the Bayesian analysis of regression data, Lenk and DeSarbo (2000) introduced a 
Bayesian inference for finite mixtures of generalized linear models with random effects. 
Int their mixture model, each component is a regression model with a random-effects 
parts and the model is dedicated to multivariate regression data. 

In this paper, we present a probabilistic Bayesian formulation to model spatial func¬ 
tional data by extending the approaches of Nguyen et ah (2014) and apply the proposal 
to surface approximation and clustering. The model is also related to the random-effects 
mixture model of Lenk and DeSarbo (2000) in which we explicitly add mixed-effects and 
derive it for spatial functional data by using the Nodal basis functions (NBFs). The 
NBFs (Malfait and Ramsay, 2003) used in Ramsay et ah (2011); Sangalli et ah (2013); 
Nguyen et ah (2014) represent an extension of the univariate B-spline bases to bivariate 
surfaces. We thus introduce the Bayesian spatial spline regression with mixed-effects 
(BSSR) for fitting a population of homogeneous surfaces and the Bayesian mixtures of 
SSR (BMSSR) for fitting populations of heterogeneous surfaces organized in groups. 
The BSSR model is first applied in surface approximation. Then, the BMSSR model 
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is applied in model-based surface clustering by considering the real-world handwritten 
digits from the MNIST data set (LeCun et ah, 1998). 

This paper is organized as follows. Section 2 provides a description of recent related 
work on mixture of spatial spline regressions. Then, in Section 3, we present the BSSR 
model and its inference technique using Gibbs sampling. Then, in Section 4, we present 
the Bayesian mixture formulation, that is, the BMSSR model, and show how to apply 
it in model-based clustering of surfaces. A Gibbs sampler is derived to estimate the 
BMSSR model parameters. In section 5, we apply the proposed models on simulated 
surfaces and on a real handwritten digit recognition problem. Finally, in Section 6, we 
draw some conclusions and mention some future possible directions for this research. 

2 Mixtures of spatial spline regressions with mixed- 
effects 

This section is dedicated to related work on mixture of spatial spline regressions (SSR) 
with mixed-effects (MSSR), introduced by Ng and McLachlan (2014). We first describe 
the regression model with linear mixed-effects and its mixture formulation, in the general 
case, and then describe the models for spatial regression data. 

2.1 Regression with mixed-effects 

The miexd-effects regression models (see for example Laird and Ware (1982), Verbeke 
and Lesaffre (1996) and Xu and Hedeker (2001)), are appropriate when the standard 
regression model (with fixed-effects) can not sufficiently explain the data. For example, 
when representing dependent data arising from related individuals or when data are 
gathered over time on the same individuals. In that case, the mixed-effects regression 
model is more appropriate as it includes both fixed-effects and random-effects terms. In 
the linear mixed-effects regression model, the x 1 response y, = (yn, ..., yi mi ) T is 
modeled as: 

Yi = Xj/3 + Tjbj + (1) 

where the px 1 vector (3 is the usual unknown fixed-effects regression coefficients vec¬ 
tor describing the population mean, bj is a q x 1 vector of unknown subject-specific 
regression coefficients corresponding to individual effects, independently and identically 
distributed (i.i.d) according to the normal distribution J\f( n Rj) and independent from 
the rrii x 1 error terms e* which are distributed according to A/"(0, £*), and X* and T* are 
respectively rojXp and ro,xg known covariate matrices. A common choice for the noise 
covariance-matrix is to take a diagonal matrix Ej = cr 2 I m . where I m . denotes the m* x m; 
identity matrix. Thus, under this model, the joint distribution of the observations y. 
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and the random effects bj is the following joint multivariate normal distribution (see for 
example Xu and Hedeker (2001)): 


’ y* 

~ a r ( 

Xj/3 + T i[i i 


o'"! mi + TjRjTJT T j R, 

_ b * . 
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Then, from (2) it follows that the observations y* are marginally distributed according to 
the following normal distribution (see Verbeke and Lesaffre (1996) and Xu and Hedeker 
( 2001 )): 


/(yj|Xj, T,; ’2') — A/"(y.j; Xj/3 + Tj/Xj, cr 2 I mi + TjRjTj 1 ). (3) 

2.2 Mixture of regressions with mixed-effects 

The regression model with mixed-effects (1) can be integrated into a finite mixture 
framework to deal with regression data arising from a finite number of groups. The 
resulting mixture of regressions model with linear mixed-effects (Verbeke and Lesaffre, 
1996; Xu and Hedeker, 2001; Celeux et ah, 2005; Ng et ah, 2006) is a mixture model where 
every component k (k — 1,..., K ) is a regression model with mixed-effects given by (1), 
K being the number of mixture components. Thus, the observation y* conditionally on 
each component k is modeled as: 

Yi — ~X-i(3 k + Tjb^ + e ik (4) 

where /3 fe , b^ and beik are respectively the the the fixed-effects regression coefficients, 
the random-effects regression coefficients for individual i, and the error terms, for com¬ 
ponent k. The random-effect coefficients hi k are i.i.d according to A/"(/x fci , R^) and are 
independent from the error terms e lk which follow the distribution A/"(0, cr 2 I m J. Let Zj 
denotes the categorical random variable representing the component memebership for 
the ztli observation. Thus, conditional on the component Zj = k , the observation y^ and 
the random effects b, have the following joint multivariate normal distribution: 

’ y* 

. b * 

and thus the observation y* are marginally distributed according to the following normal 
distribution (see Verbeke and Lesaffre (1996) and Xu and Hedeker (2001)): 

/(yi|Xj, Tj, Zj = k-^k) = A/"(y*; Xj/3 fc + T,;RfcjTf + <j^l mi ). (6) 
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The unknown parameter vector of this component-specific density is given by: 

&k = (Pi, <?l, nl 1} ph, vech(R fcl ) r ,..., vech(R fcn ) T ) T 

where vech is the half-vectorization operator which produces the lower triangular portion 
of the symmetric matrix it operates on. Tims, the marginal distribution of y* uncondi¬ 
tional on component memberships is given by the following mixture distribution: 

K 

f (yj|Xj, Tjj &) = X z f3 h + TT;R/ ;T/ + a 2 k l m .) (7) 

k =1 

where the 7i k s given by ir k = P(Zj = k) for k = 1,K represent the mixing propor¬ 
tions which are non-negative and sum to 1. The unknown mixture model parameters 
given by the parameter vector 

* = (7r 1 ,...,7TK_ 1 ,»Ff,...,!^) T 

where is the parameter vector of component k, are usually estimated, given an i.i.d 
sample of n observations, by maximizing the observed-data log-likelihood 

n K 

log L(&) = X] lo S 7Tk + T iVki, TiR^Tf + (8) 

i =1 k =1 

via the expectation-maximization (EM) algorithm (Dempster et ah, 1977; McLachlan 
and Krishnan, 2008; Verbeke and Lesaffre, 1996; Xu and Hedeker, 2001; Celeux et ah, 
2005; Ng et ah, 2006). 

2.3 Mixtures of spatial spline regressions with mixed-effects 

For spatial regression data, Nguyen et ah (2014) introduced the spatial spline regression 
with liner mixed-effects (SSR). The model is given by (1) where the covariate matrices, 
which are assumed to be identical in Nguyen et ah (2014), that is, T* = X,; and denoted 
by Si, in this spatial case, represent a spatial structure are calculated from the Nodal 
Basis Function (NBF) (Malfait and Ramsay, 2003). Note that in what follows we will 
denote the number of columns of Sj by d. The NBF idea is an extension of the B-spline 
bases used in general for univariate or multivariate functions, to bivariate surfaces and 
was first introduced by Malfait and Ramsay (2003) and then used namely in Ramsay 
et ah (2011) and Sangalli et ah (2013) for surfaces. 

In Nguyen et ah (2014), it is assumed that the random-effects are centered with 
isotropic covariance matrix common to all the individuals, that is b* ~ A/"(0, £ 2 I m .). 
Thus, from (6) it follows that under the spatial spline regression model with linear 
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mixed-effects, the density of the observation y t is given by 

/(y i |S,;S') =A^( yi ;S,/3,{ 2 S, : Sr + <7 2 I m ,). (9) 

It follows that under the mixture of spatial spline regression models with linear mixed- 
effects, the density of y, ; is given by: 

K 

/MS.; f) = Yl s,0 t ,&s,sj + (io) 

k =l 

where the model parameter vector is given by: 

^ = (tTi, • • • , 7T/\-l; P\ , ■ ■ ■ , Pki a lt ■ ■ ■ i °Ki £l i ■ ■ • j Ck) ■ 

Both of models are fitted by using the EM algorithm. In particular, for the mixture 
of spatial spline regressions, the EM algorithm maximizes the following observed-data 
log-likelihood: 


n K 

logL(S') =^log^^AO(y,;SA.eIS i Sf + oll m ). (11) 

i= 1 k =1 

More details on the EM developments for the two models can be found in detail in 
Nguyen et al. (2014). Note that Nguyen et al. (2014) assumed a common noise variance 
a 2 for all the mixture components in (10) and hence in (11). 

3 Bayesian spatial spline regression with mixed-effects 
(BSSR) 

We introduce a Bayesian probabilistic approach to the spatial spline regression model 
with mixed-effects presented in Nguyen et ah (2014) in a maximum likelihood context. 
The proposed model is thus the Bayesian spatial spline regression with linear mixed- 
effects (BSSR) model. We first present the model, the parameter distributions and then 
derive the Gibbs sampler for parameter estimation. 

3.1 The model 

The Bayesian spatial spline regression with mixed-effects (BSSR) model is defined by: 

y* = Si (/3 + bi) + e* (12) 


7 


where the model parameters in this Bayesian framework are assumed to be random vari¬ 
ables with specified prior distributions, and the spatial covariates matrix S* is computed 
from the Nodal basis functions. We first describe the Nodal basis functions and then 

continue the model formulation derivation. 

Introduced by Malfait and Ramsay (2003), the idea of Nodal basis functions (NBFs) 
extends the use of B-splines for univariate function approximation (Ramsay and Silver- 
man, 2005), to the approximation of surfaces. For a fixed number of basis functions d, 
defined on a regular grid with regularly spaced points c(l) (l = 1,..., d) of the domain we 
are working on, with d defined as d = d\d 2 where d\ and d 2 are respectively the columns 
and rows number of nodes, the Ah surface can be approximated using piecewise linear 
Lagrangian triangular finite element NBFs constructed as (e.g see Sangalli et al. (2013); 
Nguyen et al. (2014)): 
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dl ol 

• f _ I f X C ^ . $2 . S 1 C 2 - S 2 Ci . 

II X e < \xi,X 2 ) : ci — 01 < x\ < ci, — Xl H--- < x 2 < c 2 > 
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(13) 
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where x tJ = (x tJ i , Xjj 2 ) are the two spatial coordinates of y t j. c = (ci, C 2 ) denotes a node 
center parameter and hi and hi are respectively the vertical and horizontal shape param¬ 
eters representing the distances between two consecutive centers. Thus, this construction 
leads to the following rrii x d spatial covariates matrix: 


Si 


/ s(xi;ci) 
s(x 2 ; ci) 


s(xi; c 2 ) 
s(x 2 ; c 2 ) 


s(xi; c d ) \ 
s(x 2 ; c d ) 


\s(x m .;ci) s(x mi ; c 2 ) 


s(x m .;c d )/ 


(14) 


where s(x; c) is a shortened notation of the NBF s(x, c,hi,h 2 ) (the shape parameters 
hi and h 2 being constant). An example of a NBF function defined on the rectangular 
domain (xi,a; 2 ) G [—1,1] x [—1,1] with a single node c = (0,0) and hi = h 2 = 1 is 
presented in the Figure 1. 
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Figure 1: Nodal basis function s(x, c, 5 2 ), where c = (0, 0) and 8\ — S 2 — 1. 

The model parameters of the proposed Bayesian model, which are given by the pa¬ 
rameter vector & = (/3 T , a 2 , bi,..., b n , £ 2 ) T are assumed to be unknown random vari¬ 
ables with the following prior distributions. We use conjugate priors for ease as those 
mostly used priors in the literature for example as in (Diebolt and Robert, 1994)Richard¬ 
son and Green (1997) (Stephens, 2000). The used priors for the parameters are as follows: 


N (Mo> So) 
AA(O d ,e 2 I,) 
IG(ao, bo) 
IG(g 0l h 0 ) 



(15) 


where (/x 0 , S 0 ) are the hyper-parameters of the normal prior over the fixed-effects co¬ 


efficients, £ 2 is the variance of the normal distribution over the random-effect coefficients, 


ao and bo (respectively go and ho) are respectively the shape and scale parameters of the 
Inverse Gamma (IG) prior over the variance £ 2 (respectively a 2 ). Figure 2 shows the 
graphical representation of the proposed BSSR model for a set of homogeneous functions 
(yi. • • - ,yn)- 
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Figure 2: Graphical representation of the proposed BMSSR model 

We use MCMC sampling for the Bayesian inference of the model. MCMC sampling 
is indeed one the most commonly used inference techniques in Bayesian analysis of 
mixtures, in particular the Gibbs sampler (e.g see Diebolt and Robert (1994)). 

3.2 Bayesian inference using Gibbs sampling 

In order to implement the Gibbs sampler, we first derive the full conditional posterior 
distributions of the model parameters. Due to the chosen conjugate hierarchical prior 
(15) presented in the previous section, the full conditional posterior distributions can 
then be found analytically as shown in detail in the Appendix A. The full conditional 
distribution of each of the model parameter are given in the following subsections. We 
use the notation |... to denote a conditioning of the parameter in question on all the 
other parameters and the observed data. 

3.2.1 Full conditional distribution of the fixed-effects coefficient vector (3 

Applying the Bayes theorem to the joint distribution leads to the following posterior 
over the fixed-effects regression coefficients /3: p(/3 1...) = p(Y\/3, B, a 2 )p({3). Thus, the 
/3’s posterior distribution is given by the following normal distribution: 

/3|...~A/>o,V 0 ) (16) 
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with 


1 n 

V„-‘ = 

i =1 

= V 0 ^ (y* - Sjbj) - So Vo j • 

3.2.2 Full conditional distribution of the random-elfect coefficient vector bj 

By using the same reasoning as for the fixed-effects regression coefficients, the posterior 
of the random-effects coefficients is calculated as: p(bj|...) = p(y* |/3, bj, a 2 )p(bj|£ 2 ) and 
is thus given by the following normal distribution: 

b i |...~JV'(i/ 1 ,V 1 ) (17) 

with: 

V^ 1 = — S T S- -I- — 

Vi = V, (isf(y, - Si/3)V 

3.2.3 Full conditional distribution of the noise variance a 2 

For the noise variance a 2 which has an inverse Gamma prior, the posterior given by 
p(a 2 1...) = p(Y\/3, B, a 2 )p(a 2 ) is the following inverse Gamma distribution: 

a 2 \...^IG(g 1 ,h 1 ) (18) 


with 


n 

9o + 2 ’ 

, , EIU (y* - W - s^f ( y< - s l/3 - s lbj ) 

ho + - 2 - 

3.2.4 Full conditional distribution of the random-effect variance £ 2 

The same reasoning is used to derive the posterior of the random-effect variance £ 2 . The 
posterior distribution for the parameter is given by p(£ 2 |...) oc p(B|£ 2 )p(£ 2 ) which leads 
to the following posterior inverse Gamma distribution: 

£ 2 | (<*!,&!) ( 19 ) 


9i = 
hi = 
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with: 
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a o + -yi 
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n=i bf b, 


Algorithm 1 summarizes the implementation of the Gibbs sampler for the proposed 
BMSSR model. Each sample of the Gibbs sampler is drawn from the above posterior 
distributions. 

Algorithm 1 Gibbs sampling for the Bayesian spatial spline regression model with 
mixed-effects (BMSSR) 

Inputs: The observations Y = (y l7 ..., y n ) and the spatial spline regression matrices 
(S l7 ...,S n ) 

Initialize: 

fix the model hyper parameters: (/x 0 , E 0 , g 0 , h 0 , a 0 , b 0 ) 
initialize the model parameters: (/3^°\ a 2 ^ 0 ) 

for t — 1 to ^Gibbs samples do 


1 . Sample the random-effects variance: £ 2 * ~ IG ( ao + f , fro + 

2. Sample the noise variance 


E"=i 


(t-i) T , (t-n 


a 2{t) ~IG\g 0 + %,h 0 + 


Er=i(yi-s^ (t - 1) -s i bf- 1) ) T (y i -s i /3(*- 1 )-s i bf- 1) ) 


3. Sample the fixed-effects coefficients vector /3^ ~ J\T(vq\ V^) with 

V-1(*) — V" 1 i 1 X^ n cTc 

v o —^o + z^j=i 

d'^vo^ELisny.-Siby 1 ’: 

for i = 1 to n do 

4. Sample the random-effects coefficients vector rs_/ with 

\T- 1(*) _ 1 CTC i i 

V 1 ~ ff 2(i) a i ? 2(t) 

= Vi (^WSf(y. - 

end for 
end for 


4 Bayesian mixture spatial spline regressions with mixed- 
effects (BMSSR) 

The BMSSR model presented previously is dedicated to learn from a single or a set 
of homogeneous spatial functional data. However, when the data present a natural 
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grouping aspect, this may be restrictive, and its extension to accommodate clustered 
data is needed. We therefore integrate the BMSSR model into a mixture framework. 
This is mainly motivated by a clustering prospective. The resulting model is there¬ 
fore a Bayesian mixture of spatial spline regression with mixed-effects (BMSSR) and is 
described in the following section. 


4.1 The model 

Consider that there are K sub-populations within the data set Y = (y l5 ..., y n ). The 
proposed BMSSR model has the following stochastic representation. Conditional on 
component /c, the individual y, is modeled by a BSSR model as: 

y* = Sj(/3 fc + bjfc) + &ik. (20) 


Thus, a K component Bayesian mixture of spatial spline regression models with mixed- 
effects (BMSSR) has the following density: 

K 

/(YilSi;*) = Af (yi-,Si(f3 k + b ik ),allm z ) (21) 

k =1 

where the parameter vector of the model is given by 


^ = (7T!, . . . , 7T K —1 , 01 ,..., Bf ,..., B£, of,..., <t 2 k , ...,£ 


K) > 


B fc = (Rf fc ,..., b^ fc ) T being the vector of the random-effect coefficients of the kth BSSR 
component. 

The BMSSR model is indeed composed of BSSR components, each of them has pa¬ 
rameters tf'fc = (0%, Bj[, ol, C,l) T and a mixing proportion parameter 7 r*,. Therefore, 
conditional on component k, the parameter priors are defined as in the BSSR model 
presented (15) in the previous section. For the BMSSR model, we therefore just need 
to specify the distribution on the mixing proportions t r = (7r 1; ..., iT k) which follow the 
Multinomial distribution in the generative model of the non-Bayesian mixture. We use 
a conjugate prior as for the other parameters, thats is, a Dirichlet prior with hyper¬ 
parameters ol = («!,. .. ,Or-). The hierarchical prior from for the BMSSR model pa¬ 
rameters is therefore given by: 


7r ~ Dir(ai ,..., olk) 

Pk ~ AA(/3 fc |/x o ,S 0 ) 

bifc|£f ~ A/"(bj fc |0d, £|I<f) 

e k ~ IG(ti\a 0 ,b 0 ) 

ol ~ IG(o 2 k \g 0l h 0 ). 


( 22 ) 
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Figure 3 shows the graphical representation of the proposed BMSSR model for a set of 
heterogeneous functions (yi,..., y n ). 


a 



Figure 3: Graphical representation of the proposed BMSSR model. 

4.2 Bayesian inference using Gibbs sampling 

In this section we derive the full conditional posterior distributions needed for the Gibbs 
sampler to infer the model parameters. Further mathematical calculation details for 
these posterior distributions are given in Appendix B. Consider the vector of aug¬ 
mented parameters, which is the vector of parameters (7 t t , f3 T , B 7 , cr 21 , $, 2T ) T where 
7T = (t r ll ... I 7r*) r , (3 = 0 3l...,(3 T K ) T , <x 2 = (o*,...,o* k ) t , and = (£ 2 ,..., &) T , 
augmented by the unknown components labels z = (zi,, z n ) and the data Y. Let us 
also introduce the binary latent component-indicators Zik such that z t k = 1 iff Zi = k, Zi 
being the hidden label of the mixture component from which the All observation is gen¬ 
erated. Similarly to the case of Bayesian multivariate Gaussian mixtures, the posterior 
distributions of the allocation variables z and the mixing proportions 7r are Multinomial 
and Dirichlet, and are as follows (see for example (Robert, 2007, Section 6.4). 

4.2.1 Full conditional distributions of the discrete indicator variables z 

The posterior distributions of the allocation variables zis given by the following Multi¬ 
nomial distribution with parameters the posterior probabilities of the component labels 
Zi, that is: 

Zi\... ~ Mult(l; Tn,, Tix) (23) 
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with Tjfc (1 < k < K) the posterior probability that the zth observation is issued from 
mixture component k: 


T ik = P (Zi = k\yi, S*; \P) 


n k AT (yj|Sj(/3 fc + b ifc ), ^I m .) 
Eti M (y*|Si(A + b a ),tfl m .) 


(24) 


4.2.2 Full conditional distribution of the mixing proportions 7r 

The mixture proportions, which have a Dirichlet prior of parameter a, have the following 
Dirichlet posterior distribution: 


7r|... ~ Dir (ai + ni ,,.., a K + n K ) 


(25) 


with n k = E" =1 z ik being the number of observations originated from component k. 


4.2.3 Full conditional distribution of the fixed-effects coefficient vectors (3 k 


The full conditional posterior distribution of the fixed-effects coefficient vector (3 k , which 
has normal prior distribution, is obtained By applying the Bayes theorem to the joint dis¬ 
tribution and leads to the following normal posterior distribution p(/3 k \...) = p(Y\(3 k , b*,, a k , z)p(/3 k ) 
which is specified as: 


where 


(3 k \... ~ .A/Vo,V 0 ) 


v ,- 1 = + 


^ 2=1 


^0 = 


Vo ( — V] z ik sf (y i - Sib ik ) - E 0 1 
\ erf ^ 


Mo 


& i =1 


(26) 


(27) 


4.2.4 Full conditional distribution of the random-effects coefficient vectors 

b ik 

The posterior distribution over the random-effects coefficients b lk is computed similarly 
and is given by the following posterior normal distribution thanks to the conjugate 
normal prior: p(b ifc |...) = p(y*|/3 fe , b ik , cr k , Zi = k)p(b ik \£ k ), that is: 

bik\— ~ V(^i, Vi) (28) 
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where 


V 


-l 

i 


V\ 


_LcTc. -I- — 

o ' s o ’ 


crt 


$ 


Vi [-zSfiyi-Sifa 


ot 


4.2.5 Full conditional distribution of the noise variances a'l 

The Inverse Gamma prior on er 2 leads to the following posterior p(cr||...) = p(Y\(3, hk, <7%, z)p(cr^), 
which is also an Inverse Gamma distribution given by: 


<7*1- ~ IG{g iA) 


(29) 


with: 


fh — 90+- Zik, 


i= 1 

^7i / 0/3 ni \T 


hi — ho + 


Er=i ^ (y* s * b *fc) (y* - s */3fc - s t b 


ik ) 


4.2.6 Full conditional distribution of the random-effects variances 

The same reasoning is used to derive the posterior of the random-effect variances 
for which the prior is an Inverse Gamma. The posterior is in this case calculated as 
|...) = p(bfc|£|)p(£k) and is given by the following Inverse Gamma distribution: 

f*|- ~ IG( ai M) (30) 


with 


n 

0\ = o-o + 
h , , EtibSbi* 

»i — % H--- 

The pseudo-code 2 summarizes the Gibbs sampler to infer the parameters of the 
proposed Bayesian mixture of spatial spline regressions with mixed-effects (BMSSR). 
Each Gibbs sample is drawn from the above posterior distributions. 
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Algorithm 2 Bayesian mixture of spatial spline regressions with mixed-effects 
(BMSSR). 

Inputs: The observations Y = (yi,..., y n ) and the spatial spline regression matrices 
(Si,..., S n ) and the number of mixture components K 

Initialize: 

the model hyper parameters: (a, /x 0 , S 0 , go, h 0l ao, bo) 
the model parameters: (n, (3, B, , cr 2 ) 
for t = 1 to ^Gibbs samples do 

for i = 1 to n do 

1. Sample the allocation variables: zf^ ~ Mult(l; , rjj}) with the posterior 

u viv (t) , , + ^ a - 4 - (*) 7r fc* ) • A/ '(yi|S i (/3^ ) +bW),<7 2 ^ ) I mi ) 

probabilities ta. calculated according to rA. = —=— ,,, , - ttt—ttt -r-G— r 

E*i^ () ^(yilSiOa^+bWj^Wi™.) 

end for 

2. Sample the mixing proportions: 7r^ ~ Dir(ai + nf \ ..., ax + n'x ) with nj^ = 


Z^j=i Rfc 

for k — 1 to A do 

3. Sample the random-effects variance: 

IG a 0 + 


t2(b 

Sfc 


2 ’ u 1 2 
4. Sample the noise variance: 


cr 2 ^ ~ IG [ go + ho + 2 


Ef=i z ik (yi-Si/3(. t 1) -Sib(* 1) ) (yj-Si/3 




5. Sample the fixed-effects coefficient vector: (3^) ~ M(vq\ Vq with 

_sr^ n J)o To. 

2 PI Z_/i=l ~ik ‘A 


-y — lb) 1 | 1 ^b)c T 


vf = Vo (i 5 w Efc=i 4 ’sf (yi - S.b'r 1 ’) - s 0 -Vo 

for i = 1 to n do 

6. Sample the random-effects coefficient vector: h-' f j ~ J\f(ui , Vf ) with 
v -ib) _ i cTc , i 


V? = V! 

end for 
end for 
end for 


k 4 

1 C Tt 


t 2FJ 

T k 


Sf(y. - Si/3 


(*)' 
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4.3 Model-based surface clustering using the BMSSR 

In addition to Bayesian density estimation, The BMSSR model can also be used for 
Bayesian model-based surface clustering so that to provide a partition of the data into 
K clusters. Model-based clustering using the BMSSR model consists in assuming that 
the observed data {Si,Yi}U are generated from a K component mixture of spatial 
spline regressions with mixed-effects with parameter vector ^. The mixture components 
can be interpreted as clusters and hence each cluster can be associated with a mixture 
component. The problem of clustering therefore becomes the one of estimating the 
BMSSR parameters l P This is performed here by Gibbs sampling which provides a 
MAP estimator which can be obtained by averaging the Gibbs posterior sample 

after removing some initial samples corresponding to a burn-in period. A partition of the 
data can then be obtained from the posterior memberships by applying the MAP rule, 
that is, by maximizing the posterior cluster probabilities (24) to assign each observation 
to a cluster: 


K IT ^ 

Zi = arginaxT i fc(«f'MAp) 

k =1 

where z % represents the estimated cluster label for the ztli observation. 


(31) 


5 Application to simulated data and real data 

In this section we apply the two proposed Bayesian models 1 on simulated data and 
real data. We first consider simulated surfaces to test the model in terms of surface 
approximation. Then, we apply it on a handwritten character recognition problem by 
considering real images from the MNIST data set (LeCun et ah, 1998) to test it in terms 
of surface approximation and clustering. 


5.1 Simulated surface approximation using the BSSR model 


sin(- v /l + x\ + x\) 

y/TT 


and we 


We consider the bi-dimensional arbitrary function /i(x) = 

' '* ' I 

J l i x 2 

attempt to approximate it from a sample of simulated noisy surfaces. We simulate a 
sample of 100 random surfaces y*(i = 1,..., 100) as follows. Each surface y, ; is composed 
of nrii = 21 x 21 observations generated on a square domain (aq,^) 6 [—10,10] x 
[—10,10]. To generate the surface y,, we first add random effects to the mean surface by 
computing /Xj(x) + b, and then y,• is simulated by adding a random error term, that is, 


1 The corresponding algorithms (including the EM alternative) have been written in Matlab and are 
available upon request from the author. 
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y i = /Xj(x) + bj + ej with bj ~ A/"(0, 0.1 2 I m J and e* ~ A^(0, 0.1 2 I m ,). Then, the sample 
of simulated surfaces Y = (y 1; ..., yioo) is approximated by applying the BSSR model. 

Figure 4a shows the actual mean function before the noise and the random effects 
are added, and Figure 4b shows an example of simulated surface. We apply the BSSR 
model with d = 5x5 NBFs and d = 15 x 15 NBFs and show the obtained mean surface 
/t(x) = Si/3 fitted from the whole data set. 

Figure 4c shows the fitted mean surface with d = 5x5 NBF basis, while Figure 4d 
shows its analogous with d = 15 x 15 NBFs. 



Figure 4: True surface (a), an example of noisy surface from the simulated sample (b), 
A BSSR fit using 5x5 NBFs (c) and 15 x 15 NBFs (d). 


ft can be seen that for the two cases, the approximated surface resembles the actual 
one. In particular, the second approximation, using a reasonable number of basis func¬ 
tions, is very close to the true surface. This is confirmed by the value of the empirical sum 
of squared error between the true surface and the fitted one SSE = Xq=i(/b'( x ) — /b'( x )) 2 
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(m = 441 here), which equal 0.0865 in this case and which corresponds to a very rea¬ 
sonable fit. 

5.2 Handwritten digit clustering using the BMMSSR model 

In this section we apply the BMSSR model on a subset of the ZIPcode data set Hastie 
et al. (2010), which is issued from the MNIST data set (LeCun et ah, 1998). The data 
set contains 9298 16 by 16 pixel gray scale images of Hindu-Arabic handwritten numerals 
distributed as in the following table 1. 


digit 

1 

2 

3 

4 

5 

6 

7 

8 

9 

0 

training set 

1005 

731 

658 

652 

556 

664 

645 

542 

644 

1194 

testing set 

264 

198 

166 

200 

160 

170 

147 

166 

177 

359 


Table 1: Zipcode data set digits distribution 


Each individual y, contains m* = 256 observations y,j = (yn ,..., yi 25 e) T values in the 
range [—1,1]. We run the Gibbs sampler given by algorithm 2 with a number of clusters 
K = 8,..., 12 on a subset of 1000 digits randomly chosen from the Zipcode testing set 
with the distribution given in table 2, 


digit 

1 

2 

3 

4 

5 

6 

7 

8 

9 

0 

00 

108 

105 

96 

100 

107 

94 

106 

90 

97 

97 

K = 9 

97 

107 

107 

100 

103 

112 

88 

98 

92 

96 

I< = 10 

97 

90 

100 

98 

107 

107 

102 

107 

97 

98 

I< = 11 

105 

104 

99 

96 

95 

106 

101 

93 

107 

94 

I< = 12 

111 

96 

96 

105 

108 

96 

97 

99 

97 

98 


Table 2: Repartion of used Zipcode random subsets of 1000 digits 


We used d = 8x8 NBFs, which corresponds to the quarter of the resolution of the 
images in the Zipcode data set. We performed five runs of of the algorithm, each for a 
different model: K = 8,..., 12. The corresponding mean Adjusted Rand Index (ARI) 
values are given in Table 3. The model with 12 clusters has the highest ARI value. 


K 

8 

9 

10 

11 

12 

ARI 

0.4848 

0.4694 

0.4445 

0.5139 

0.5238 


Table 3: ARI for the BMSSR model for a number of clusters K = 8,..., 12. 
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Figure 5 shows the cluster means for K = 12 obtained by the proposed Baysian 
model (BMSSR). It clearly shows that the model is able to recover the ten digits as well 
as subgroups of the digit 0 and the digit 5. 



Figure 5: Cluster means obtained by the proposed BMSSR model for with K = 12 
components. 


6 Conclusion and future work 

We presented a probabilistic Bayesian model for homogeneous spatial data based on 
spatial spline regression with mixed-effects (BSSR). The model is able to accommodate 
individual with both fixed and random effect variability. Then, motivated by a model- 
based surface clustering perspective, we introduced the Bayesian mixture of spatial spline 
regressions with mixed-effects (BMSSR) for spatial functional data dispersed into groups. 
We derived Gibbs samplers to infer the models. Application on simulated surfaces 
illustrates the surface approximation using the BSSR model. Then, application on real 
data in a handwritten digit recognition framework shows the potential benefit of the 
proposed BMSSR model for practical applications on surface clustering. 

The BMSSR can be extended to be used for supervised surface classification. This can 
be performed without difficulty by modeling each class by a BMSSR model and then 
applying the Bayes rule to assign a new observation to the class corresponding to the 
highest posterior probability. 

A future work will therefore consist in conducting additional experiments on real data 
clustering and discrimination as well as model selection using information criteria such 
as BIC and ICL. 

Then, another interesting perspective is to derive a Bayesian non-parametric model by 
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relying of Dirichlet Process mixture models where the number of mixture components 
can be directly inferred from the data. 
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